[fix][broker] Do not trigger topic GC if replication is still active #25915

Open
poorbarcode wants to merge 10 commits into apache:master from poorbarcode:fix/topic_gc_replication

Conversation

@poorbarcode commented Jun 1, 2026

Contributor

Motivation

Replication can get stuck because of topic GC:

  • Enable two-way replication between the primary cluster and the backup cluster.
  • primary cluster: messages are published to the primary cluster.
    • backup cluster: no consumer or producer is registered.
  • backup cluster: the topic GC check runs and finds:
    • No subscriptions.
    • No producers except the remote producer.
    • The topic is therefore considered eligible for GC:
      • Replication is disabled.
      • The topic is not deleted, because the remote-side producer is still registered.
  • backup cluster: the topic GC process waits for the remote producer to disconnect.
    • It will never be executed, since nothing else attempts to delete the topic.
  • backup cluster: the backlog grows because the replicator was closed.
    • Although messages replicated from the remote cluster are not replicated back, the replicator still has to check each of them and then mark-delete it.
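
The sequence above can be modelled in a few lines. The sketch below is a toy state machine, not Pulsar code: the class `StuckReplicationSketch` and all of its fields and methods are hypothetical names chosen for illustration, and it assumes that the old GC check ignores the remote producer, as the list describes.

```java
// Toy model of the topic on the backup cluster. Every name here is hypothetical;
// none of it is Pulsar's actual API.
final class StuckReplicationSketch {
    boolean replicatorRunning = true;       // replicator from backup to primary
    boolean remoteProducerConnected = true; // producer opened by the primary's replicator
    boolean topicDeleted = false;
    long replicatorBacklog = 0;             // entries the replicator cursor has not mark-deleted

    /** A message replicated from the primary cluster arrives on the backup topic. */
    void onMessageFromPrimary() {
        if (topicDeleted) {
            return;
        }
        if (!replicatorRunning) {
            // Nobody checks and mark-deletes the entry, so it stays in the backlog.
            replicatorBacklog++;
        }
        // Otherwise the running replicator checks the entry, skips it (it must not be
        // copied back to its origin), and mark-deletes it, so the backlog stays flat.
    }

    /** Old GC pass, assuming no subscriptions and no local producers. */
    void runOldGc() {
        // The remote producer is not counted, so the topic looks inactive.
        replicatorRunning = false;
        // Deletion is refused while the remote producer is still registered.
        if (!remoteProducerConnected) {
            topicDeleted = true;
        }
    }
}
```

In this model, calling `runOldGc()` once and then `onMessageFromPrimary()` five times leaves `topicDeleted` false and `replicatorBacklog` at 5, which is the stuck state described above.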

Modifications

  • Topic GC is triggered only if there is no producer (including remote producers) and no replicator.
  • The replicator producer is closed once there are no more messages to replicate.
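
As a rough illustration of the first point, the eligibility check could look like the sketch below. This is not the code in this PR: `TopicGcSketch`, `TopicState`, and both method names are hypothetical, and the "before" variant only encodes the behaviour described under Motivation.

```java
// Hypothetical model of the inactive-topic check; these are not Pulsar's real classes.
final class TopicGcSketch {
    /** Snapshot of the state the GC check looks at. */
    static final class TopicState {
        final int subscriptions;
        final int localProducers;
        final int remoteProducers;   // producers opened by replicators of other clusters
        final int activeReplicators; // replicators of this topic that are still running

        TopicState(int subscriptions, int localProducers, int remoteProducers, int activeReplicators) {
            this.subscriptions = subscriptions;
            this.localProducers = localProducers;
            this.remoteProducers = remoteProducers;
            this.activeReplicators = activeReplicators;
        }
    }

    /** Behaviour described under Motivation: remote producers and replicators are ignored. */
    static boolean eligibleBefore(TopicState s) {
        return s.subscriptions == 0 && s.localProducers == 0;
    }

    /** Behaviour proposed under Modifications: every producer and replicator counts. */
    static boolean eligibleAfter(TopicState s) {
        return s.subscriptions == 0
                && s.localProducers == 0
                && s.remoteProducers == 0
                && s.activeReplicators == 0;
    }
}
```

For a backup topic with one remote producer and one running replicator, `eligibleBefore` returns true while `eligibleAfter` returns false, so under the stricter check GC would not start while replication is still active.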

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

@poorbarcode poorbarcode added this to the 5.0.0-M1 milestone Jun 1, 2026
@poorbarcode poorbarcode self-assigned this Jun 1, 2026
@poorbarcode poorbarcode added type/bug The PR fixed a bug or issue reported a bug release/4.2.2 release/4.0.11 ready-to-test labels Jun 1, 2026

@codelipenghui left a comment

Contributor

@poorbarcode I'm thinking of another solution:

  1. Close the replicator producer if it has been idle for a while (e.g. 10 minutes)
  2. Check all producers when detecting an inactive topic (skipping the replicator producer, as happens now, is more complicated)
  3. Only delete the inactive topic if there are no producers (including the replicator producer)

Your current solution adds a 7-day delay to inactive-topic deletion if geo-replication is enabled. But if there have been no messages in the last 7 days, the issue can still happen, right?
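
A minimal sketch of the idle-close idea in step 1, under stated assumptions: `IdleReplicatorProducerSketch` and its methods are hypothetical, the 10-minute timeout is just the example value from the comment, and reopening the producer on the next replicated message is an assumption of this sketch rather than something the comment specifies.

```java
// Hypothetical model of a replicator-owned producer that closes itself when idle.
// None of these names come from Pulsar.
final class IdleReplicatorProducerSketch {
    // Example value taken from the suggestion ("e.g. 10 mins").
    static final long IDLE_TIMEOUT_MILLIS = 10 * 60 * 1000L;

    private boolean producerOpen = true;
    private long lastSendMillis;

    IdleReplicatorProducerSketch(long nowMillis) {
        this.lastSendMillis = nowMillis;
    }

    /** Called whenever the replicator sends a message to the remote cluster. */
    void onMessageReplicated(long nowMillis) {
        producerOpen = true; // assumption: the producer is reopened on demand
        lastSendMillis = nowMillis;
    }

    /** Periodic check: close the producer once it has been idle long enough. */
    void closeIfIdle(long nowMillis) {
        if (producerOpen && nowMillis - lastSendMillis >= IDLE_TIMEOUT_MILLIS) {
            producerOpen = false;
        }
    }

    boolean isProducerOpen() {
        return producerOpen;
    }
}
```

With this shape, the inactive-topic check in steps 2 and 3 would only have to ask whether any producer is still open, without special-casing the replicator producer.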

Comment thread pulsar-broker/src/main/java/org/apache/pulsar/broker/service/AbstractTopic.java Outdated
@void-ptr974

Contributor

> @poorbarcode I'm thinking of another solution:
>
>   1. Close the replicator producer if it has been idle for a while (e.g. 10 minutes)
>   2. Check all producers when detecting an inactive topic (skipping the replicator producer, as happens now, is more complicated)
>   3. Only delete the inactive topic if there are no producers (including the replicator producer)
>
> Your current solution adds a 7-day delay to inactive-topic deletion if geo-replication is enabled. But if there have been no messages in the last 7 days, the issue can still happen, right?

I agree with this concern. A replicated topic can be quiet for longer than the threshold while the replication relationship is still valid.

Using latestPublishTime here makes topic deletion depend on the traffic pattern instead of the producer lifecycle. It seems cleaner to let the replicator close its producer explicitly when it is really idle, and let topic GC only check whether producers still exist.

@poorbarcode

Contributor Author

@codelipenghui @void-ptr974 I changed the solution as @codelipenghui suggested. Please review again.

Comment thread pulsar-broker/src/main/java/org/apache/pulsar/broker/service/AbstractTopic.java Outdated
@poorbarcode poorbarcode requested a review from lhotari June 3, 2026 08:58
Comment thread pulsar-broker/src/main/java/org/apache/pulsar/broker/service/AbstractTopic.java Outdated
Comment thread pulsar-broker/src/main/java/org/apache/pulsar/broker/service/AbstractTopic.java Outdated
Comment thread pulsar-broker/src/main/java/org/apache/pulsar/broker/service/AbstractTopic.java Outdated
@poorbarcode poorbarcode requested a review from void-ptr974 June 11, 2026 09:47
@poorbarcode

Contributor Author

@codelipenghui @lhotari @void-ptr974

Please review the PR again. I have changed the solution:

  • Topic GC is triggered only if there is no producer (including remote producers) and no replicator.
  • The replicator producer is closed once there are no more messages to replicate; this no longer relies on the Persistent Topic.

@poorbarcode poorbarcode requested review from codelipenghui, lhotari and void-ptr974 and removed request for codelipenghui, lhotari and void-ptr974 June 11, 2026 09:49

Labels

ready-to-test release/4.0.12 release/4.2.3 type/bug The PR fixed a bug or issue reported a bug

4 participants